26 research outputs found

    On the Importance of Registers for Computability

    All consensus hierarchies in the literature assume that we have, in addition to copies of a given object, an unbounded number of registers. But do we really need these registers? This paper considers what happens if one attempts to solve consensus using various objects but without any registers. We show that, under a reasonable assumption, objects like queues and stacks cannot emulate the missing registers. We also show that, perhaps surprisingly, initialization, shown to have no computational consequences when registers are readily available, is crucial in determining the synchronization power of objects when no registers are allowed. Finally, we show that without registers, the number of available objects affects the level of consensus that can be solved. Our work thus raises the question of whether consensus hierarchies that assume an unbounded number of registers truly capture synchronization power, and begins a line of research aimed at better understanding the interaction between read-write memory and the powerful synchronization operations available on modern architectures. Comment: 12 pages, 0 figures.
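
    To make the role of the registers concrete, here is a minimal sketch (not taken from the paper; names and values are illustrative) of the textbook two-process consensus construction from one queue plus two read/write registers: the process that dequeues the losing token can only learn the winner's input by reading the winner's register, which is exactly the kind of read-write help that disappears when registers are removed.

```python
# Minimal sketch: 2-process consensus from a queue (initialized with two items)
# plus two single-writer registers.  Illustration only, not the paper's construction.
from collections import deque
from threading import Thread, Lock

queue = deque(["WIN", "LOSE"])   # queue object, pre-initialized with two tokens
queue_lock = Lock()              # makes the dequeue atomic in this simulation
propose = [None, None]           # two read/write registers, one per process
decision = [None, None]

def consensus(i, my_value):
    propose[i] = my_value            # write own input to own register first
    with queue_lock:
        token = queue.popleft()      # atomic dequeue
    if token == "WIN":
        decision[i] = my_value       # the winner decides its own input
    else:
        decision[i] = propose[1 - i] # the loser learns the winner's input via the register

threads = [Thread(target=consensus, args=(i, f"value-{i}")) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert decision[0] == decision[1]    # agreement
print(decision)
```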

    How Many Cooks Spoil the Soup?

    In this work, we study the following basic question: "How much parallelism does a distributed task permit?" Our definition of parallelism (or symmetry) here is not in terms of speed, but in terms of the identical roles that processes play at the same time in the execution. We initiate this study in population protocols, a very simple model that not only allows for a straightforward definition of what a role is, but also captures the challenge of isolating the properties that are due to the protocol from those that are due to the adversary scheduler, which controls the interactions between the processes. We (i) give a partial characterization of the set of predicates on input assignments that can be stably computed with maximum symmetry, i.e., $\Theta(N_{min})$, where $N_{min}$ is the minimum multiplicity of a state in the initial configuration, and (ii) turn our attention to the remaining predicates and prove a strong impossibility result for the parity predicate: the inherent symmetry of any protocol that stably computes it is upper bounded by a constant that depends on the size of the protocol. Comment: 19 pages.
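
    As a concrete reference point for the model (illustration only, not the paper's construction; function names and the interaction budget below are mine), the sketch simulates a population protocol under a uniform random scheduler that stably computes the OR predicate "some agent started with input 1": the transition (1, 0) -> (1, 1) spreads the 1 through the population, after which every agent's output never changes again.

```python
# A minimal population-protocol sketch: anonymous finite-state agents, uniform
# random pairwise interactions, stably computing the OR predicate.
import random

def run_or_protocol(inputs, interactions=None):
    states = list(inputs)                      # each agent's state is its input bit
    n = len(states)
    interactions = interactions if interactions is not None else 20 * n * n
    for _ in range(interactions):
        i, j = random.sample(range(n), 2)      # uniform random scheduler picks a pair
        if states[i] == 1 or states[j] == 1:   # transition rule (1, 0) -> (1, 1)
            states[i] = states[j] = 1
    return states[0] == 1                      # after stabilization every agent agrees

print(run_or_protocol([0, 0, 1, 0]))   # expected: True  (some input was 1)
print(run_or_protocol([0, 0, 0, 0]))   # expected: False (no input was 1)
```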

    Fast Approximate Counting and Leader Election in Populations

    We study the problems of leader election and population size counting for population protocols: networks of finite-state anonymous agents that interact randomly under a uniform random scheduler. We show a protocol for leader election that terminates in $O(\log_m(n) \cdot \log_2 n)$ parallel time, where $m$ is a parameter, using $O(\max\{m, \log n\})$ states. By adjusting the parameter $m$ between a constant and $n$, we obtain a single leader-election protocol whose time and space can be smoothly traded off, from $O(\log^2 n)$ down to $O(\log n)$ time and from $O(\log n)$ up to $O(n)$ states. Finally, we give a protocol which provides an upper bound $\hat{n}$ on the size $n$ of the population, where $\hat{n}$ is at most $n^a$ for some constant $a > 1$. This protocol assumes the existence of a unique leader in the population and stabilizes in $\Theta(\log n)$ parallel time, using a constant number of states in every node except the unique leader, which is required to use $\Theta(\log^2 n)$ states.
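
    For contrast with the protocols above, the sketch below (illustrative, not from the paper; the function name and population size are mine) simulates the folklore two-state leader-election protocol, in which every agent starts as a leader and one of any two leaders that meet drops out. It uses only $O(1)$ states but needs roughly $\Theta(n)$ parallel time in expectation, which is the kind of baseline that the time-space trade-off above improves on.

```python
# Folklore two-state leader election in population protocols: all agents start
# as leaders ("L"); when two leaders interact, one becomes a follower ("F").
import random

def folklore_leader_election(n, seed=0):
    rng = random.Random(seed)
    states = ["L"] * n
    leaders = n
    interactions = 0
    while leaders > 1:
        i, j = rng.sample(range(n), 2)               # uniform random scheduler
        if states[i] == "L" and states[j] == "L":
            states[j] = "F"                          # one of the two leaders survives
            leaders -= 1
        interactions += 1
    return interactions / n                          # parallel time = interactions / n

print(folklore_leader_election(500))  # grows roughly linearly with the population size
```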

    Lock-Free algorithms under stochastic schedulers

    In this work, we consider the following random process, motivated by the analysis of lock-free concurrent algorithms under high memory contention. In each round, a new scheduling step is allocated to one of $n$ threads, according to a distribution $p = (p_1, p_2, \ldots, p_n)$, where thread $i$ is scheduled with probability $p_i$. When some thread first reaches a set threshold of executed steps, it registers a win, completing its current operation, and resets its step count to 1. At the same time, threads whose step count was close to the threshold also get reset because of the win, but to 0 steps, being penalized for almost winning. We are interested in two questions: how often does some thread complete an operation (system latency), and how often does a specific thread complete an operation (individual latency)? We provide asymptotically tight bounds for the system and individual latency of this general concurrency pattern, for arbitrary scheduling distributions $p$. Surprisingly, a simple characterization exists: in expectation, the system will complete a new operation every $\Theta(1/\|p\|^2)$ steps, while thread $i$ will complete a new operation every $\Theta(\|p\|^2/p_i^2)$ steps. The proof is interesting in its own right, as it requires a careful analysis of how the higher norms of the vector $p$ influence the thread step counts and latencies in this random process. Our result offers a simple connection between the scheduling distribution and the average performance of concurrent algorithms, which has several applications.
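
    The process is simple enough to simulate directly. The sketch below follows the description above; the threshold, penalty window, step budget, and function names are illustrative choices of mine, not values from the paper, and the printed "predicted" quantities are only the asymptotic scalings $1/\|p\|^2$ and $\|p\|^2/p_i^2$, up to constants that depend on the threshold.

```python
# Simulation sketch of the random process described in the abstract.
import random

def simulate(p, threshold=64, penalty_window=8, total_steps=500_000, seed=0):
    rng = random.Random(seed)
    n = len(p)
    counts = [0] * n                 # steps executed in each thread's current operation
    wins = [0] * n                   # operations completed by each thread
    for _ in range(total_steps):
        i = rng.choices(range(n), weights=p)[0]   # scheduler gives the step to thread i w.p. p_i
        counts[i] += 1
        if counts[i] >= threshold:   # thread i registers a win (completes an operation)
            wins[i] += 1
            counts[i] = 1            # winner starts its next operation
            for j in range(n):       # threads close to the threshold are penalized to 0
                if j != i and counts[j] >= threshold - penalty_window:
                    counts[j] = 0
    system_latency = total_steps / sum(wins)
    individual_latency = [total_steps / w if w else float("inf") for w in wins]
    return system_latency, individual_latency

p = [0.4, 0.3, 0.2, 0.1]
norm_sq = sum(x * x for x in p)
sys_lat, ind_lat = simulate(p)
print("system latency (steps/op):", sys_lat, "| scaling 1/||p||^2 =", 1 / norm_sq)
print("individual latencies:", ind_lat, "| scalings ||p||^2/p_i^2 =", [norm_sq / x ** 2 for x in p])
```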

    Quantized stochastic gradient descent: communication versus convergence

    Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to the excellent scalability properties of this algorithm and to its efficiency in the context of training deep neural networks. A fundamental barrier to parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allows gradient updates to be compressed at each node while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a number of bits per iteration that is sublinear in the model dimension, and it can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost while remaining competitive with standard uncompressed techniques on a variety of real tasks.
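
    To give a flavor of how such schemes work, the sketch below implements a simplified stochastic quantizer in the spirit of QSGD (the exact scheme, its bit-level encoding, and the guarantees are in the paper; function names and the level count are illustrative): magnitudes are stochastically rounded to a few levels relative to the vector's norm, so the quantizer is unbiased in expectation and each coordinate can be transmitted as a sign plus a small integer.

```python
# Simplified, unbiased stochastic quantization of a gradient vector (sketch only).
import numpy as np

def quantize(v, s=4, rng=np.random.default_rng(0)):
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return norm, np.sign(v), np.zeros_like(v, dtype=np.int64)
    scaled = np.abs(v) / norm * s                 # each magnitude mapped into [0, s]
    lower = np.floor(scaled)
    prob_up = scaled - lower                      # stochastic rounding keeps E[levels] = scaled
    levels = lower + (rng.random(v.shape) < prob_up)
    return norm, np.sign(v), levels.astype(np.int64)   # transmit: one float, signs, small ints

def dequantize(norm, signs, levels, s=4):
    return norm * signs * levels / s              # unbiased reconstruction of v

v = np.array([0.3, -1.2, 0.05, 0.0, 2.0])
print(dequantize(*quantize(v)))                   # close to v on average over the randomness
```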

    Generating Fast Indulgent Algorithms

    DOI: 10.1007/s00224-012-9407-2. Theory of Computing Systems, 51(4), 404-42

    How to allocate tasks asynchronously

    DOI: 10.1109/FOCS.2012.41. Proceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS), 331-34

    Optimal-time adaptive strong renaming, with applications to counting

    DOI: 10.1145/1993806.1993850. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC), 239-248